
Merged Apache bug fixes #100

Merged
markhamstra merged 18 commits into alteryx:csd-1.5 from markhamstra:csd-1.5
Oct 9, 2015

Conversation

@markhamstra

No description provided.

Davies Liu and others added 15 commits September 28, 2015 14:40
The UTF8String may come from an UnsafeRow, in which case its underlying buffer is not copied, so we should clone it in order to hold it in Stats.

cc yhuai

Author: Davies Liu <davies@databricks.com>

Closes apache#8929 from davies/pushdown_string.

(cherry picked from commit ea02e55)
Signed-off-by: Yin Huai <yhuai@databricks.com>
In the course of https://issues.apache.org/jira/browse/LEGAL-226 it came to light that the guidance at http://www.apache.org/dev/licensing-howto.html#permissive-deps on permissively-licensed dependencies has a different interpretation than we (er, I) had been operating under. "pointer ... to the license within the source tree" specifically means a copy of the license within Spark's distribution, whereas at the moment Spark's LICENSE has a pointer to the license in the other project's source tree.

The remedy is simply to inline all such license references (i.e. BSD/MIT licenses), or to include their text in a "licenses" subdirectory and point to that.

Along the way, we can also treat other BSD/MIT licenses, whose text has been inlined into LICENSE, in the same way.

The LICENSE file can continue to provide a helpful list of BSD/MIT licensed projects and a pointer to their sites. This would be over and above including license text in the distro, which is the essential thing.

Author: Sean Owen <sowen@cloudera.com>

Closes apache#8919 from srowen/SPARK-10833.

(cherry picked from commit bf4199e)
Signed-off-by: Sean Owen <sowen@cloudera.com>
…AllocationSuite

Fix the following issues in StandaloneDynamicAllocationSuite:

1. It should not assume master and workers start in order
2. It should not assume master and workers get ready at once
3. It should not assume the application is already registered with master after creating SparkContext
4. It should not access Master.app and idToApp which are not thread safe

The changes includes:
* Use `eventually` to wait until master and workers are ready to fix 1 and 2
* Use `eventually` to wait until the application is registered with master to fix 3
* Use `askWithRetry[MasterStateResponse](RequestMasterState)` to get the application info to fix 4
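The `eventually` pattern referenced above comes from ScalaTest's `Eventually` trait; the helper below is a simplified, self-contained stand-in for illustration, not the suite's actual code:

```scala
// Simplified stand-in for ScalaTest's eventually: retry a block until it
// succeeds or the timeout elapses, instead of assuming readiness up front.
def eventually[T](timeoutMs: Long = 5000, intervalMs: Long = 100)(body: => T): T = {
  val deadline = System.currentTimeMillis() + timeoutMs
  var last: Throwable = null
  while (System.currentTimeMillis() < deadline) {
    try return body
    catch { case e: Throwable => last = e; Thread.sleep(intervalMs) }
  }
  throw new AssertionError(s"condition not met within $timeoutMs ms", last)
}

// In the spirit of the fix (hypothetical names):
// eventually() { assert(getMasterState().workers.size == numWorkers) }
```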

Author: zsxwing <zsxwing@gmail.com>

Closes apache#8914 from zsxwing/fix-StandaloneDynamicAllocationSuite.

(cherry picked from commit dba95ea)
Signed-off-by: Andrew Or <andrew@databricks.com>
Author: Ryan Williams <ryan.blake.williams@gmail.com>

Closes apache#8939 from ryan-williams/errmsg.

(cherry picked from commit b7ad54e)
Signed-off-by: Andrew Or <andrew@databricks.com>
…Suite

Fixed the test failure here: https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/

This failure is because `HeartbeatReceiverSuite.heartbeatReceiver` may receive `SparkListenerExecutorAdded("driver")` sent from [LocalBackend](https://github.com/apache/spark/blob/8fb3a65cbb714120d612e58ef9d12b0521a83260/core/src/main/scala/org/apache/spark/scheduler/local/LocalBackend.scala#L121).

There are other race conditions in `HeartbeatReceiverSuite` because `HeartbeatReceiver.onExecutorAdded` and `HeartbeatReceiver.onExecutorRemoved` are asynchronous. This PR also fixed them.

Author: zsxwing <zsxwing@gmail.com>

Closes apache#8946 from zsxwing/SPARK-10058.

(cherry picked from commit 9b3e776)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
The fix is to coerce `c("a", "b")` into a list such that it could be serialized to call JVM with.

Author: felixcheung <felixcheung_m@hotmail.com>

Closes apache#8961 from felixcheung/rselect.

(cherry picked from commit 721e8b5)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
I don't believe the API changed at all.

Author: Avrohom Katz <iambpentameter@gmail.com>

Closes apache#8957 from akatz/kcl-upgrade.

(cherry picked from commit 883bd8f)
Signed-off-by: Sean Owen <sowen@cloudera.com>
`Murmur3_x86_32.hashUnsafeWords` only accepts word-aligned bytes, but the bytes of an unsafe array are not guaranteed to be word-aligned.
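For illustration, a word-aligned length can be obtained by rounding up to a multiple of 8; this sketches the alignment constraint, not the patch itself:

```scala
// Sketch: hashUnsafeWords assumes the length is a multiple of 8 (one JVM
// word), so arbitrary byte arrays must either be padded to a word boundary
// or hashed with a byte-oriented variant instead.
val data = Array[Byte](1, 2, 3, 4, 5)      // 5 bytes: not word-aligned
val paddedLen = (data.length + 7) / 8 * 8  // rounds up to 8
val padded = java.util.Arrays.copyOf(data, paddedLen)
assert(padded.length % 8 == 0)
```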

Author: Wenchen Fan <cloud0fan@163.com>

Closes apache#8987 from cloud-fan/hash.
This should go into 1.5.2 also.

The issue is we were no longer adding the __app__.jar to the system classpath.

Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>
Author: Tom Graves <tgraves@yahoo-inc.com>

Closes apache#8959 from tgravescs/SPARK-10901.

(cherry picked from commit e978360)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
This PR implements the following features for both `master` and `branch-1.5`.
1. Display the failed output op count in the batch list
2. Display the failure reason of output op in the batch detail page

Screenshots:
<img width="1356" alt="1" src="https://cloud.githubusercontent.com/assets/1000778/10198387/5b2b97ec-67ce-11e5-81c2-f818b9d2f3ad.png">
<img width="1356" alt="2" src="https://cloud.githubusercontent.com/assets/1000778/10198388/5b76ac14-67ce-11e5-8c8b-de2683c5b485.png">

There are still two remaining problems in the UI.
1. If an output operation doesn't run any Spark job, we cannot get its duration, since it is currently computed as the sum of all jobs' durations.
2. If an output operation doesn't run any Spark job, we cannot get its description, since it is taken from the latest job's call site.

We need to add new `StreamingListenerEvent` about output operations to fix them. So I'd like to fix them only for `master` in another PR.

Author: zsxwing <zsxwing@gmail.com>

Closes apache#8950 from zsxwing/batch-failure.

(cherry picked from commit ffe6831)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Currently, if it isn't set, it scans `/lib/*` and adds every dir to the
classpath, which makes the env too large, and every command called
afterwards fails.

Author: Kevin Cox <kevincox@kevincox.ca>

Closes apache#8994 from kevincox/kevincox-only-add-hive-to-classpath-if-var-is-set.
The created decimal is wrong when using `Decimal(unscaled, precision, scale)` with unscaled > 1e18, precision > 18, and scale > 0.

This bug exists since the beginning.
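The failure mode can be illustrated with plain `java.math` types (a sketch of the arithmetic, not Spark's `Decimal` internals): an unscaled value above ~1e18 overflows a signed 64-bit `Long`, so it must be carried as a `BigInteger`/`BigDecimal` rather than squeezed through a compact long representation.

```scala
import java.math.{BigDecimal => JBigDecimal, BigInteger}

// unscaled > 1e18: does not fit in a signed 64-bit Long without overflow
val unscaled = new BigInteger("1234567890123456789012") // 22 digits
val decimal  = new JBigDecimal(unscaled, 2)             // scale = 2
assert(decimal.toPlainString == "12345678901234567890.12")
```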

Author: Davies Liu <davies@databricks.com>

Closes apache#9014 from davies/fix_decimal.

(cherry picked from commit 37526ac)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
…ifferent Oops size.

UnsafeRow contains 3 pieces of information when pointing to some data in memory (an object, a base offset, and a length). When the row is serialized with Java/Kryo serialization, the object layout in memory can change if two machines have different pointer widths (Oops in JVM).

To reproduce, launch Spark using

MASTER=local-cluster[2,1,1024] bin/spark-shell --conf "spark.executor.extraJavaOptions=-XX:-UseCompressedOops"

And then run the following

scala> sql("select 1 xx").collect()

Author: Reynold Xin <rxin@databricks.com>

Closes apache#9030 from rxin/SPARK-10914.

(cherry picked from commit 84ea287)
Signed-off-by: Reynold Xin <rxin@databricks.com>
…eaming applications

Dynamic allocation can be painful for streaming apps and can lose data. Log a warning for streaming applications if dynamic allocation is enabled.

Author: Hari Shreedharan <hshreedharan@apache.org>

Closes apache#8998 from harishreedharan/ss-log-error and squashes the following commits:

462b264 [Hari Shreedharan] Improve log message.
2733d94 [Hari Shreedharan] Minor change to warning message.
eaa48cc [Hari Shreedharan] Log a warning instead of failing the application if dynamic allocation is enabled.
725f090 [Hari Shreedharan] Add config parameter to allow dynamic allocation if the user explicitly sets it.
b3f9a95 [Hari Shreedharan] Disable dynamic allocation and kill app if it is enabled.
a4a5212 [Hari Shreedharan] [streaming] SPARK-10955. Disable dynamic allocation for Streaming applications.

(cherry picked from commit 0984129)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
BryanCutler and others added 3 commits October 8, 2015 22:23
…rain with given regParam and convergenceTol parameters

These params were being passed into the StreamingLogisticRegressionWithSGD constructor, but not transferred to the call for model training. Same with StreamingLinearRegressionWithSGD. I added the params as named arguments to the call and also fixed the intercept parameter, which was being passed as the regularization value.
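This bug class is easy to reproduce in miniature; the trainer signature below is hypothetical, not the actual MLlib API:

```scala
// Hypothetical trainer signature: positional arguments can silently bind a
// value to the wrong parameter (e.g. an intercept flag landing in regParam),
// which is what named arguments guard against.
def train(stepSize: Double, regParam: Double = 0.0, convergenceTol: Double = 0.001): String =
  s"stepSize=$stepSize regParam=$regParam convergenceTol=$convergenceTol"

train(0.1, 0.01)                             // is 0.01 regParam or the tolerance?
train(stepSize = 0.1, convergenceTol = 0.01) // named args remove the ambiguity
```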

Author: Bryan Cutler <bjcutler@us.ibm.com>

Closes apache#9002 from BryanCutler/StreamingSGD-convergenceTol-bug-10959.

(cherry picked from commit 5410747)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
…n on Aggregate

For example, we can write `SELECT MAX(value) FROM src GROUP BY key + 1 ORDER BY key + 1` in PostgreSQL, and we should support this in Spark SQL.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes apache#8548 from cloud-fan/support-order-by-non-attribute.
@yeweizhang

Can we also pull this fix?

https://issues.apache.org/jira/browse/SPARK-10389

This will fix the 100+ failures we ran into when comparing the native and SparkSQL results. Thank you.

@markhamstra
Author

Already did.

markhamstra added a commit that referenced this pull request Oct 9, 2015
@markhamstra markhamstra merged commit ce28740 into alteryx:csd-1.5 Oct 9, 2015